Crosslingual Countability Classification: English meets Dutch

Authors

  • Leonoor van der Beek
  • Timothy Baldwin
Abstract

This paper presents a range of methods for classifying Dutch nouns as countable, uncountable or plural-only, based on both Dutch and English data. The classification is based on the occurrence of countability-specific linguistic features that are extracted from unannotated corpora. We show that in the absence of reliable Dutch gold standard data, cross-linguistic classification can be achieved on the basis of a word-to-word or feature-to-feature mapping between English and Dutch.
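As a rough illustration of the word-to-word mapping idea mentioned in the abstract, the sketch below projects English countability classes onto Dutch nouns through a bilingual lexicon. It is a minimal sketch, not the authors' implementation: the lexicon entries, the class labels, and the function name `project_countability` are all assumptions invented for this example.

```python
# Toy data, invented for illustration only (not taken from the paper).
# English nouns with their (hypothetical) known countability classes.
EN_COUNTABILITY = {
    "car": {"countable"},
    "furniture": {"uncountable"},
    "scissors": {"plural_only"},
    "cake": {"countable", "uncountable"},
}

# Hypothetical Dutch-to-English translation lexicon.
NL_TO_EN = {
    "auto": ["car"],
    "meubilair": ["furniture"],
    "schaar": ["scissors"],
    "taart": ["cake"],
    "fiets": [],  # no translation available in this toy lexicon
}


def project_countability(dutch_noun, nl_to_en, en_classes):
    """Return the union of the classes of all English translations.

    An empty set means no class could be projected for this noun.
    """
    projected = set()
    for english_noun in nl_to_en.get(dutch_noun, []):
        projected |= en_classes.get(english_noun, set())
    return projected


if __name__ == "__main__":
    for noun in NL_TO_EN:
        classes = project_countability(noun, NL_TO_EN, EN_COUNTABILITY)
        print(noun, sorted(classes))
```

A projection like this is only as good as the assumption that translation pairs share countability. In this toy lexicon, "schaar" inherits the plural-only class from "scissors" even though it is an ordinary countable noun in Dutch, which is the kind of mismatch a corpus-feature-based approach would be expected to catch.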

Similar resources

The Ins and Outs of Dutch noun countability classification

This paper presents a range of methods for classifying Dutch noun countability based on either Dutch or English data. The classification is founded on translational equivalences and the corpus analysis of linguistic features which correlate with particular countability classes. We show that crosslingual classification on the basis of word-to-word or feature-to-feature mappings between English an...

Crosslingual Countability Classification with EuroWordNet

We examine the hypothesis that noun countability is consistent for a given word semantics by way of a series of experiments involving EuroWordNet and the English and Dutch languages. The basic method involves determining a default set of countabilities for each EuroWordNet synset based on countability-mapped words in that synset, and testing the match between these countabilities and those of h...

The University of Amsterdam at the CLEF Cross Language Speech Retrieval Track 2007

In this paper we present the contents of the University of Amsterdam submission in the CLEF Cross Language Speech Retrieval 2007 English task. We describe the effects of using character n-grams and field combinations on both monolingual English retrieval, and crosslingual Dutch to English retrieval.

Multilingual Training of Crosslingual Word Embeddings

Crosslingual word embeddings represent lexical items from different languages using the same vector space, enabling crosslingual transfer. Most prior work constructs embeddings for a pair of languages, with English on one side. We investigate methods for building high quality crosslingual word embeddings for many languages in a unified vector space. In this way, we can exploit and combine infor...

Learning the Countability of English Nouns from Corpus Data

This paper describes a method for learning the countability preferences of English nouns from raw text corpora. The method maps the corpus-attested lexico-syntactic properties of each noun onto a feature vector, and uses a suite of memory-based classifiers to predict membership in 4 countability classes. We were able to assign countability to English nouns with a precision of 94.6%.
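The idea of mapping each noun onto a feature vector and classifying it with a memory-based learner can be sketched as a small k-nearest-neighbour classifier. This is only an illustrative stand-in for the suite of memory-based classifiers the abstract mentions: the feature names, the training values, the two class labels, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training examples: relative frequencies of invented
# lexico-syntactic cues, paired with an assumed class label.
TRAIN = [
    ({"sing_with_a": 0.6, "plural": 0.3, "with_much": 0.0}, "countable"),
    ({"sing_with_a": 0.5, "plural": 0.4, "with_much": 0.0}, "countable"),
    ({"sing_with_a": 0.0, "plural": 0.0, "with_much": 0.4}, "uncountable"),
    ({"sing_with_a": 0.1, "plural": 0.0, "with_much": 0.3}, "uncountable"),
]


def distance(a, b):
    """Euclidean distance between two sparse feature dictionaries."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def knn_predict(features, train, k=3):
    """Label a noun by majority vote among its k nearest stored examples."""
    nearest = sorted(train, key=lambda example: distance(features, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    unseen = {"sing_with_a": 0.55, "plural": 0.35}
    print(knn_predict(unseen, TRAIN))
```

Memory-based learning stores all training examples and defers generalisation to prediction time, which is why the sketch has no separate training step.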

Publication date: 2003